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Reliable and More Powerful Methods for Power 
Analysis in Structural Equation Modeling 

Ke-Hai Yuan , 1 2 Zhiyong Zhang , 2 and Yanyun Zhao 1 

l Renmin University of China 
2 University of Notre Dame 


The normal-distribution-based likelihood ratio statistic T m j = nF m i is widely used for power 
analysis in structural Equation modeling (SEM). In such an analysis, power and sample size 
are computed by assuming that T m i follows a central chi-square distribution under Hq and a 
noncentral chi-square distribution under H„. However, with either violation of normality or 
not a large enough sample size, both empirical and analytical results indicate that the chi- 
square distribution assumptions are not realistic and consequently methods of power analysis 
based on such assumptions are not valid. This article describes a Monte Carlo (MC) method 
for power analysis. A measure of effect size for characterizing the power property of 
different rescaled statistics is also provided. Robust methods are proposed to increase the 
power of T m i and other statistics. Simulation results show that the MC method reliably 
controls Type I errors and robust estimation methods effectively increase the power, and their 
combination is thus recommended for conducting power analysis in SEM. 

Keywords: likelihood ratio statistic, Monte Carlo, power, robust estimation, Type I errors 


Measurements in social and health sciences typically con¬ 
tain errors. By separating measurement errors from true 
scores or latent constructs, structural equation modeling 
(SEM) has become a major research methodology in 
many areas where the focus is on the relationships among 
the latent constructs as well as those between the latent 
constructs and their observed indicators. Like any statistical 
method, inference in SEM might suffer from Type I or 
Type II errors. Proper methods are needed in applications 
to minimize those errors. In particular, a power value itself 
does not have much meaning if the testing procedure can¬ 
not control Type I errors. Various developments have been 
made for power analysis in SEM. Most of them assume that 
the involved statistic follows a central chi-square distribu¬ 
tion under the null hypothesis Hq, and a noncentral chi- 
square distribution under an alternative hypothesis H a . The 
purpose of this article is to present methods for power 
analysis that do not rely on unrealistic chi-square distribu¬ 
tion assumptions. In addition, we also discuss the 
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relationship between the power of a statistical test and the 
underlying population distribution of the sample with 
respect to effect size in SEM, and propose to use robust 
methods for more powerful tests. A Monte Carlo (MC) 
study is also conducted to evaluate the proposed methods. 

One of the earliest developments for power analysis in 
SEM was given by Satorra and Saris (1985). They consid¬ 
ered the normal-distribution-based likelihood ratio statistic 
T m , and assumed that {T ml \H 0 )~x 2 df and {T m i\H a )~x 2 df (r\), 
where df is the nominal degrees of freedom, and the non¬ 
centrality parameter (ncp) q is obtained by fitting the pro¬ 
posed model to the population covariance matrix generated 
by a misspecified model under H a . The approach to power 
analysis in MacCallum, Browne, and Sugawara (1996) is 
also based on T m i following a central chi-square distribu¬ 
tion under Hq and a noncentral chi-square distribution 
under H a , but the ncp q in their approach is specified via 
a hypothetical value of the root mean square error of 
approximation (RMSEA; Steiger & Lind, 1980) rather 
than generated by a specific misspecified model. Power 
analysis based on T m i following and X^/(fi) were further 
extended to incomplete data by Dolan, van der Sluis, and 
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Grasman (2005) and Davey and Savla (2009). However, 
because normally distributed data are as rare as unicorns 
(Micceri, 1989), procedures based on T m i following a chi- 
square distribution might not be valid with either complete 
or incomplete data in practice. In particular, with a non- 
normally distributed population, T m i could reject a correct 
model 100% at the nominal level of 5% (e.g., Hu, Bentler 
& Kano, 1992). Actually, even for normally distributed data 
with a correctly specified model, T m \ might reject the cor¬ 
rect model close to 100% at the nominal level of 5% when 
p (the number of variables) is large and N (the number of 
observations) is not large enough (Moshagen, 2012). Thus, 
more reliable methods for power analysis are needed for 
both normally and non-normally distributed data. 

Satorra (2003) studied the power properties of three test 
statistics: the likelihood ratio statistic T m i, the rescaled 
statistic T rm i (Satorra & Bentler, 1994), and the residual- 
based asymptotically distribution free statistic T mc if 
(Browne, 1984). Each statistic is assumed to follow a 
central chi-square distribution under Hq and a noncentral 
chi-square distribution under H a . Limited MC results in 
Satorra (2003) indicated that the empirical distribution of 
T nn i is reasonably approximated by its asymptotic distribu¬ 
tion under certain conditions but not always, and the dis¬ 
tribution of T ra df is markedly different from its nominal 
asymptotic chi-square distribution at smaller N. The distri¬ 
butions of T m df and T rm i under Hq were also examined in 
Nevitt and Hancock (2004), Yuan and Bentler (1998a), and 
Bentler and Yuan (1999). The results indicate that the 
nominal chi-square distribution cannot describe the 
empirical behavior of T rm l well unless the model structure 
and the population fourth-order moments satisfy certain 
conditions (Yuan & Bentler, 1999) together with a large 
enough sample size N; and T m df deviates significantly from 
Xdf unless N is very large. Thus, statistics designed to 
account for non-normality are not good candidates for 
conducting power analysis in practice when they are com¬ 
pared against the nominal chi-square distributions. 

Muthen and Muthen (2002), Schoemann, Miller, 
Pomprasertmanit, and Wu. (2014), and Zhang (2014) pro¬ 
posed using MC methods to evaluate power in testing the 
hypotheses on population parameters in SEM models (see 
also Wolf, Hamington, Clark, & Miller, 2013). A key ele¬ 
ment in their proposals is to estimate the power through 
MC simulation. The model is estimated for each sample 
generated under a given population distribution, and the z 
score or Wald statistic for the parameter under the hypoth¬ 
esis is then compared against N(0, 1) or a central chi- 
square distribution, yielding either a rejection or no rejec¬ 
tion. The power is estimated as the proportion of rejections 
across many replications. The methods would be sound if 
the z or Wald statistic literally follows N( 0,1) or the central 
chi-square distribution under the null hypothesis. However, 
empirical results indicate that standard errors (SEs) 


obtained via the information matrix, sandwich-type covar¬ 
iance matrix, or other consistent formula can be a lot 
smaller than the corresponding empirical SEs when the 
sample size is not large enough, especially when the under¬ 
lying population distribution is of heavy tails (see, e.g., 
MacKinnon & White, 1985; Yuan & Bentler, 1997b). 
Also, sandwich-type SEs as given in standard programs 
might not be consistent (Yuan & Hayashi, 2006) unless 
the model is correctly specified. The idea of estimating 
the power by comparing the values of z score against 
N(0, 1) parallels that in Yung and Bentler (1996), who 
proposed estimating power for testing the overall structure 
of an SEM model by comparing (T m i\H a ) against the nom¬ 
inal chi-square distribution, which would be a sound 
method if (T m i\H 0 )~Xdf However, many conditions can 
render the TV (0,1) or Xdf invalid, including non-normally 
distributed population, not large enough sample size, or 
incomplete data. 

A power evaluation approach that does not rely on the 
assumptions of (T\H 0 )~y^ f or (T\H a )~x 2 df (r\) was given 
by Yuan and Hayashi (2003), where the critical value 
corresponding to the distribution of (T\Hq) is estimated 
by the method of bootstrap, and power is also estimated 
by the bootstrap methodology in which the value of (T\H a ) 
is estimated at each replication and compared against the 
estimated critical value. Because the bootstrap methodol¬ 
ogy is used, this approach is conditional on a given sample. 
Yuan and Hayashi (2003) described this approach using a 
real complete data set. In this article, we extend this 
approach to power analysis by MC simulation when a 
sample is not available, and call it the MC method. The 
MC method resembles the methodology of parametric boot¬ 
strap (Efron & Tibshirani, 1993, Section 6.5), and it allows 
us to include features of the expected population distribu¬ 
tion to be studied. But we do not need to know the exact 
form of the distribution of the target population. In parti¬ 
cular, for any statistic T with Type I error and power 
defined by 

a = P(T > c\- a \Ho) and power = P(T > c\- a \H a ), 

we estimate the critical value c i_„ and the value of power 
using MC simulation. Thus, we do not need to assume a 
distribution on either (T\Hq) or (T\H a ) and can still obtain a 
consistent estimate of power by controlling Type I error at 
level a. We conduct a simulation study of the MC method 
and contrast different approaches to power analysis in SEM. 

The true power of a test statistic 1 is defined once the 
model, sample size, and a target population are given. The 


1 A test statistic is a numerical value aiming to optimally summarize 
the deviance in the data against the null hypothesis, and the statistic 
typically depends on the values of parameters computed by a particular 

estimation method (see, e.g., https://en.wikipedia.org/wiki/Test_statistic). 
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MC method allows us to consistently estimate the true 
power. However, the true power varies when a different 
statistic or an alternative estimation method is chosen. 
Methods for power analyses in the literature are mostly 
based on the analysis of sample means and covariances 
by assuming data are from normally distributed popula¬ 
tions. For example, the t, and F statistics for analysis of 
vanance (ANOVA) and regression, the Fisher z statistic for 
the product-moment correlation, and the statistic T m j for 
SEM are all based on the normality assumption, which is 
idealized rather than realistic (Micceri, 1989). When the 
underlying population distribution is of heavy tails, inaccu¬ 
racy in power estimation for testing hypotheses on means 
mostly occurs at smaller sample sizes, because the t and F 
statistics are asymptotically robust to distribution viola¬ 
tions. However, T m i or other test statistics in the context 
of SEM could suffer from severe loss of true power if the 
underlying population distribution is of heavy tails. As a 
matter of fact, regardless of how wrong a model is, T m i or 
the Fisher z statistic at a given sample size might have a 
power close to zero. Because real data tend to have heavier 
tails than that of the normal distribution, we propose to 
apply robust methods for achieving better power. We also 
use MC simulation to compare the power of test statistics 
following a robust method against that under the analysis of 
sample variances-covariances. 

There are many test statistics in SEM, and we mainly 
study the power of T m i and T rm i, but the methods and ideas 
described here can be equally applied to other statistics as 
well as to evaluating the power for testing parameters. We 
discuss power properties of T m i and T rm i under different 
estimation methods 2 in the following section. A simulation 
study evaluating the validity of the MC method and compar¬ 
ing the power properties of different statistics is presented in 
the following section. Conclusions, discussion, and recom¬ 
mendations are provided in the concluding section. 

Power analysis might be performed in the planning stage 
of a project before any data are collected or when data are 
partially or completely collected. The analysis in the plan¬ 
ning stage is mostly for predicting the minimum sample 
size needed to achieve desired results, and is called proac¬ 
tive, whereas analysis after data were collected is called 
reactive and can also be useful (see Marcoulides & Chin, 
2013). Although the focus of this article is the proactive 
approach for power analysis, our emphasis is on methods to 
control Type I error and increase statistical power. The 
procedures described are equally applicable to power ana¬ 
lysis reactively. In the concluding section, we also discuss 
how to use the information from existing data sets in the 
planning stage of a related or a new project. 


2 As the estimation method varies, the properties of T m i and T rm i also 
vary, and we label the two statistics with additional notation when they are 
evaluated under robust methods. 


TEST STATISTICS, ROBUST METHODS, EFFECT 
SIZE, AND THE MONTE CARLO METHOD 

In this section we first discuss properties of T m i and T rm i 
defined via the method of normal-distribution-based max¬ 
imum likelihood (NML). Power loss of the two statistics 
when the population distribution is of heavy tails is 
described next. We then introduce robust methods and 
discuss power properties of T m / and T rm i defined via robust 
estimates. The concept of effect size is also introduced 
and compared across different estimation methods. 
Methods for handling incomplete data and how T m i and 
Trmi are computed will also be reviewed, to set up the 
context for MC study with missing data in the following 
section. Because, beyond the class of elliptical distribu¬ 
tions, the distribution of T,„i or T rm i does not have a 
simple form even asymptotically, we mostly refer to ellip¬ 
tical distributions when discussing the effect of heavy¬ 
tailed distribution on the power of the two statistics in 
this section. Our MC study in the following section 
includes both symmetric and skewed popidation 
distributions. 


Test Statistics T ml and T rml 

Many test statistics have been developed in the SEM 
literature (see, e.g., Yuan & Bentler, 2007). The most 
widely used one is the likelihood ratio statistic T m i derived 
from NML. Let xi, \ 2 , ■ • •, x v be a random sample from 
a variate population represented by x with E{\) = // 
and Cov(x) = £; S be the sample covariance matrix; and 
£( 0) be the structural model. The NML method for SEM 
is commonly presented via the ML discrepancy function 

F,ni{ S,E( 0)) =tr(SE- 1 ( 0)) — log |SE _1 ( 0)\ -p. (1) 

Let 0 be the value of 0 that minimizes F,„i( S,£( 0)), 
then T m i = nF m i( S,E( 0)), where n = N — 1. Under the 
normality assumption and FI o:£ = E( 0), T m i converges 
to the nominal chi-square distribution yp d j as N —> oo. 
However, T m i does not approach “A- when the normality 
assumption fails to hold. When data are elliptically distrib¬ 
uted with relative kurtosis 

yg = £{[(x- /<)'2 _1 (x- //)] 2 }/>0 + 2)], (2) 

T m i converges to /?)Q/ under Ho. This implies that Type 1 
error rate of statistical inference, according to T m i~x d f, is 
directly related to the value of [i. When /? is large enough, 
we reject a correct model 100% by comparing T m \ against 
the critical value of ~Qf although the aim is at 5%. For 
example, if the population follows a multivariate /-distribu¬ 
tion with degrees of freedom m (i.e., x~t(m , //,£)), then 
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its relative kurtosis is given by P = (m — 2 )/{m — 4). Thus, 
if the population distribution follows t{m 1 //,£), the 
statistic T m i~y} d j- will reject the correct model 100% when 
m is either less than 4 or greater than but close to 4. In such 
a case, results of power analysis based on (T m i\Ho)~Xdf 
and (T m i\H a )~x d f(r\) are misleading because Type I error 
and power are confounded. A larger value of the p in 
Equation 2 is reflected by heavier tails in the observed 
data, and can be measured by multivariate kurtosis 
(Bentler, 2006, p. 106; Cain, Zhang, & Yuan, in press; 
Mardia, 1970.) 

More generally, when the popidation x follows a distri¬ 
bution with finite fourth-order moments, 

df 

T m i = ^Pjz} +° P (1), ( 3 ) 

7=1 

where o p (l) is a term that approaches zero in probability, 
the values of Pj, j= 1 , 2, ..., df are determined by the 
model structure and the population fourth-order moments 
of x; and zj are independent and each follows the standard 
normal distribution N( 0,1). Thus, the asymptotic mean of 
T,„i is given by Ylf-\ Pj- Consequently, T m j tends to reject 
correct models more often when Ylj= l Pj is greater than df. 
Again, results of power analysis based on (T m i\Ho)—yf 
and (T m i\H a )~x 2 df are misleading. 

Let f be a consistent estimator of the average 
f = Jfj'Li Pj/df, then the rescaled statistic T m i = p~ x T m i con¬ 
verges to a distribution with mean equal to df. Consequently, 
T rm i is better approximated by yf df than T m i when 1.0. 
However, it does not mean that T nn i literally follows a chi- 
square distribution. In particular, when there are substantial 
differences among the P d ,j = 1,2, ...,df, then the distribution 
of Trmi can be far from jQ/ even asymptotically (Yuan & 
Bentler, 1999). Power analysis based on T lm i~yf df can also be 
misleading, especially when the coefficients fj differ substan¬ 
tially or when the sample size is not large enough (Bentler & 
Yuan, 1999; Nevitt & Hancock, 2004; Yuan & Bentler, 1998a). 

Power Loss of T mi and T rml with Heavy-Tailed 
Population Distributions 

Because T m i tends to reject a correct model when the 
underlying population distribution is of heavier tails, we 
might think that T m \ will have a desired power value in 
general. However, the story will change if we control Type 
I errors, because the power of T m i is determined by the 
separation between (T,„i\Ho) and ( T m i\H a ), which further 
depends on the variance of T m i in addition to the size of 
model misspecification. For a give model under //„, the size 
of misspecification is measured by 


F mla =F ml {'L,'L( 0 a )), (4) 

where 6 a is the value of 0 that minimizes 7 7 ))1 /(E,E( 9)). 
Thus, the size of model misspecification is determined once 
the model is given, and the overlap between {T m i\Ho) and 
{T„,i\H a ) is then determined by the distribution of the 
underlying population in addition to sample size. As an 
example, consider x following an elliptical distribution. 
Then the asymptotic mean and variance of ( T m i\Ho ) are 
given by pdf and 2 f 2 df, respectively; and those of (T m /\H a ) 
are given by pdf + q and If 1 df + 4/?q, respectively, where 

d = nF mla (5) 

is no longer the noncentrality parameter because ( T m i\H a ) 
does not follow a noncentral chi-square distribution (see, 
e.g., Shapiro & Browne, 1987). Thus, the mean difference 
between (T m i\H a ) and {T m i\Ho) is affected little by f while 
their standard deviations increase as f> increases. 
Consequently, there is less separation between ( T m i\H a ) 
and ( T m i\Ho) as p increases. The analysis implies that the 
true power of T ml decreases as the kurtosis of the popula¬ 
tion distribution increases. 

Power loss due to heavy-tailed distribution also occurs to 
the rescaled statistic T nn i. Consider again x following an ellip¬ 
tical distribution, then (T rm i\Hf) and (T rm i\H a ) asymptotically 
follow x df and jp d ff\/P), respectively (Shapiro & Browne, 
1987). Consequently, the mean difference between (T nn i\H a ) 
and ( T rm i \Hq ) approximately equals q//?, which goes to zero as 
P increases. Thus, although the asymptotic variance of 
(T mi i\H a ), equal to 2 (df + 2q//?), decreases with /?, there 
will be little separation between (T nn i\Hf) and (T rm i\H a ) if p 
is rather large. The statistic T rm i eventually lacks power in 
detecting any misspecified model as P increases. 

A parameter test in SEM following the NML method 
faces the same issue of power loss. Although sandwich- 
type SEs provide consistent estimates of the variability of 
parameter estimates, the SEs increase as kurtosis increases. 
Consider testing Ho : 9 = 0$ based on the NML estimate 6 
of 8. Assume that the population counterpart of 8 is 8 a ; the 
sandwich-type SE of 8 is estimated as xfJn, and the 
population counterpart of f is t. Then x is proportional to 
under an elliptically distributed population with kurto¬ 
sis p (Tyler, 1983). If we define effect size as \8 a — #o|/t, 
then it decreases to zero as P increases. Thus, a parameter 
test based on sandwich-type covariance matrix will lose its 
power when the population distribution is of heavy tails. 

It is hard to give a precise quantification of power loss of 
T m i and T rm i when the population distribution is not ellip¬ 
tical, as the PjS in Equation 3 are no longer equal in general. 
We would expect that their power values for a given q to 
decrease as the average p of the PjS increases. We use the 
MC method to evaluate their power in the next section. 
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Robust Methods and Statistical Power 

Closely related to statistical power is efficiency of parameter 
estimates. A method yields more efficient parameter esti¬ 
mates also corresponds to higher statistical power in general 
when the estimates are used to construct test statistics. This 
is because the resulting test statistics (T\Hq) and ( T\H a ) will 
have less variation and thus more separation if their mean 
difference remains about the same. It is known that the true 
ML estimates are asymptotically most efficient, and NML 
estimates can be very inefficient when data possess extra 
kurtoses than that of /V( //, 2). In practice, with real data 
typically having heterogeneous marginal kurtoses (e.g., 
Blanca, Amau, Lipez-Montiel, Bono, & Bondayan, 2013; 
Cain et al, in press; Micceri, 1989), it is unlikely for us to 
obtain the true ML estimates. Nevertheless, ML based on a 
distribution that can accommodate heavy tails in real data 
will lead to more efficient parameter estimates in general. A 
class of distributions that can account for heavy tails is 
multivariate /-distributions, and /-distribution-based ML 
(TML) has been shown to yield more reliable results than 
NML when modeling real multivariate data (see, e.g., Lange, 
Little, & Taylor, 1989; Little, 1988; Tong & Zhang, 2012; 
Wilcox, 2012; Yuan & Bender, 1998b). 

A more general method than TML is robust 
M-estimation, which aims to generalize ML methodology 
in general (Huber, 1967). The key idea in robust 
M-estimation is that observations geometrically sitting far 
from the center of the majority of the data are down¬ 
weighted, and different down-weighting schemes correspond 
to different M-estimators. Many weighting functions have 
been proposed according to the size of the Mahalanobis 
distance: 


d 2 i={^i- ft )'2 ! (xi- ft). 

In particular, the weight corresponding to TML is given by 
Wj = w(dj) = ( m+p)/(m + dj), where m is the degrees of 
freedom of the /-distribution and p is the number of vari¬ 
ables. The contribution of x, to the estimation of fi and 2 or 
the structural parameter in TML is proportional to w,. 
Clearly, the weight w, assigned to the /th case becomes 
smaller as d t increases. In particular, once w,- is given, the 
estimates of fi and 2 can be regarded as the weighted 
average of x, and (x, — //)(x j — ft)', respectively. In the 
estimation process, the value of vv, will be assigned auto¬ 
matically and iteratively in the process of estimating ft and 
2 or the parameters in the structural model (Yuan & 
Bentler, 1998b). The degrees of freedom, m, can be either 
chosen or estimated. Unless the population is truly a multi¬ 
variate /-distribution, fixing m at a given value might be 
preferred because the resulting estimation is a lot easier 
(Little, 1988). Also, empirical studies suggested that using 
a weight function to down-weight extreme cases is more 


important than its precise form (Little, 1988; Wilcox, 2012; 
Yuan & Zhong, 2008). 

Another popular weight is the Huber-type weights 
(Huber, 1981). Let r 2 be the quantile of the distribution of 
Xp corresponding to probability (1 — a). The Huber-type 
weights are given by 


w/i = w\(di) 



if dj < r, , 2 / 

if di >r, and ^ = w n /K 


( 6 ) 


where k is a constant such that 7?[Xp w u(Xp)/ K ] = P> with K 
being determined by a so that the resulting estimates of 
means and covariances or the structural parameters are 
consistent when \~N{ ft , 2). Different from weights 
derived from a multivariate /-distribution, weight wn is 
applied to case x,- in estimating the means ft, and weight 
Wj 2 is applied to (x, — //)(x,- — ft)' in estimating the cov¬ 
ariance matrix 2. Under Huber-type weights, only cases 
having d, > r are down-weighted. A larger value of a 
corresponds to a smaller r, and consequently more cases 
are down-weighted. In the estimation process, one only 
needs to choose a value of a and then the weights vv,i and 
Wj 2 as well as k will be adjusted automatically in the 
estimation process (Yuan & Zhong, 2008). 

After robust estimates ft and 2 being obtained, SEM can 
be performed by replacing the S in Equation 1 with 2 and 
proceeding to estimate the structural parameters 6 by 
minimizing the corresponding discrepancy function F m j. 
Similarly, one can also substitute ft for x when a mean 
structure is involved. Yuan and Bentler (1998b) showed 
that the resulting robust estimates 6 will inherit the robust 
properties of 2. In particular, a robust version of the 

(r) 

rescaled statistic, 7j. m/ , can be obtained so that its asympto¬ 
tic mean equals df. Yuan, Chan, and Bentler (2000) also 
showed that the test statistic T^J by treating 2 as S, or a 
robust version of T m j, also more closely follows 2Q/ than 
T m i for typical real data with heavy tails. Because the power 
of a test statistic is closely related to the efficiency of the 
parameter estimates and robust methods tend to yield more 
efficient estimates, both T^j and are expected to have 
better power than T m i and T rm i (see Tyler, 1983; Yuan, 
Bentler, & Chan, 2004; Yuan & Hayashi, 2003). We use 

(r) 

the MC method to evaluate the power properties of T ml and 

(r) 

T m j and contrast them against those of T m \ and T rm /. Again, 
with the MC method, we do not need to make a central or 

(r) 

noncentral chi-square distribution assumption on either TJ 

rp(r) 

0r 

Athough robust M-estimation has been shown to yield 
more reliable results than NML in analyzing real data in 
previous studies, the advantage of a robust method also 
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depends on the population distribution underlying the sam¬ 
ple. If the population distribution is multivariate normal, 
then a robust method might not perform as well as NML 
although the difference is typically minor. Differences 
between NML and robust methods can be characterized 
asymptotically when x follows an elliptical distribution. A 
model E( 6) is called invariant under a constant scaling 
factor (ICSF) if for any parameter vector 6 and a constant 
c there exists an 0 C such that c 2 E( 0) = E( 0 C ). As noted 
in Browne (1984), essentially all interesting models are 
ICSF. Yuan et al. (2004) showed that, for a model that is 

(V) 

ICSF, the rescaled statistic T rnl/ asymptotically follows 
where the value of /? r is determined by the 
estimation method and the underlying elliptical distribution. 
When data are normally distributed and NML is chosen, 
P r = P = 1. Tyler (1983) studied robust M-estimation for 
covariance matrices and examined the properties of a 
rescaled statistic for testing hypothesis H 0 : h(l.) = 0, 
where h(-) is a vector of smooth functions defined over 
the covariance matrix E. For an SEM model that is ICSF, 
the scaling factor used in Tyler (1983) has the same asymp¬ 
totic value as [i r that governs the asymptotic distribution of 

TlH for testing the structural model E = E( 0) (Yuan et al. 
2004). Part of the results of Tyler (1983) for the value of /?,. 
is shown in the Table 1 for us to see the effect of robust 
methods, where the populations are 10-variate /-distribution 
with 1 and 5 degrees of freedom, and nonnal distribution, 
respectively; and the estimation methods are TML based on 
1 and 5 degrees of freedom, robust M-estimation using 
Huber-type weights with the tuning parameter r in 
Equation 6 corresponding to a=.l, .5, and 0 (NML), 
respectively. When the underlying population distribution 
is normal, NML yields most efficient estimates with //. = 1, 
which has the largest ncp possible for the given size of 
model misspecification as measured by the r| in Equation 5. 
Under this ideal condition, the ncp corresponding to Huber 
(.1) is only 1% smaller (1/1.01 = .99) than the q defined in 
Equation 5. When the underlying population is multivariate 
/-distribution with m = 5, TML(5) performs best with /?,. = 
1.13 and the resulting ncp is 12% smaller (1/1.13 = .88) 


TABLE 1 

The Value of /?,. With Three Population Distributions and Five 
Estimation Methods (p = 10, Tyler, 1983) 


Estimation Method 


Population Distribution 

TML(l) 

TML(5) 

Huber(.l) 

Huber(.5) 

NML 

/(l. //,£) 

1.18 

1.28 

1.48 

1.23 

00 

/(5, 

1.17 

1.13 

1.21 

1.15 

3.00 

N{pX) 

1.16 

1.09 

1.01 

1.08 

1.00 


Note. TML= /-distribution-based maximum likelihood; NML=normal-dis- 
tribution-based maximum likelihood. 


than q; Huber(.5) corresponds to = 1.15 and the result¬ 
ing ncp is 87% of q (1/1.15 = .87). In contrast, NML 
corresponds to //. = 3.0, which makes the ncp only 33% 
of the q defined in Equation 5. When the underlying popu¬ 
lation is multivariate /-distribution with m = I degree of 
freedom, NML corresponds to fi r oc and consequently 
T„,i or T rm i becomes powerless regardless of sample size or 
the size of misspecification as measured by q. In contrast, 

T [ 2i following TML(l) corresponds to P r = 1.18, and the 
resulting ncp is still 85% of q (1/1.18 = .85), and Huber 
(.5) corresponds to an ncp that is 81% of q (1/1.23 = .81). 
Note that, with x~/(l, /i,E), ncp = q/1.18 is the largest 
possible value of ncp a method can achieve because, TML 
(1) yields asymptotically most efficient estimates. Athough 
Huber-type weights are not optimal, the ncp under Huber 
(.5) is only 4% (1.18/1.23 = .96) smaller than the largest 
possible value of ncp under one of the worst conditions. 


Effect Size in SEM 

As is well known, the power of a statistical test is closely 
related to sample size and effect size. Effect size in 
ANOVA, regression, or analysis of correlation has been 
well documented, but not in SEM. We give a definition of 
effect size for SEM in this subsection and discuss its value 
by relating it to the material presented in the previous 
subsections. 

For testing Hq : n = 0 based on a sample of size n from 
a normally distributed population N(ju,o 2 ) with a known 
a 2 , the effect size is defined as 5 = /.i/o (assuming /f>0). 
The test statistic z = y/nx/o follows N(y/n8, 1) or equiva¬ 
lently z 2 follows Xi( w <5 2 )’ where «8 2 is the ncp. Thus, the 
effect size is simply (ncp /n) 1 ^ 2 . For SEM with normally 
distributed data, both T m i and T rm i approximately follow 
q). When the population distribution is normal, a direct 
imitation of the effect size from the literature on mean 
comparison leads us to defining effect size as 

E5,„; = (ncp/n) 1/2 = F^ 2 , (7) 

where F m \ a is defined in Equation 4. Note that the effect 
size in Equation 7 is defined at the population level, not 
related to the sample size N = n + 1. 

The material in the previous subsections implies that the 
definition of effect size in Equation 7 is applicable only for 
T m / and T rm i with a normally distributed population. When 
x follows a distribution other than normal, (T m /\H a ) does 
not follow a noncentral chi-square distribution in general, 
and the definition of effect size in Equation 7 is no longer 
valid. When x follows an elliptical distribution with relative 
kurtosis /?, and ( T rm i\Ho ) and (T n „i\H a ) still asymptotically 
follow chi-square distributions, we can define effect size as 
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ES™, = (7W/?) 1/2 , (8) 

which is inversely proportional to the relative kurtosis. 
When the population distribution of x is multivariate t 
with less than 4 degrees of freedom, then the effect size 
in Equation 8, associated with T m /, would be zero regard¬ 
less of the value of F mla . In contrast, the effect size asso- 

(r) 

ciated with T rnil following a robust method would be 

E ^=(fWA) 1/2 - (9) 

As listed in Table 1, even if x~t(l, /<,£), the effect size 
corresponding to Huber(.5) is still 90% ( = 1 /V 1.23) of 
the effect size corresponding to a normally distributed 
population. Thus, effect sizes following a robust method 
can be substantially greater than those following the widely 
used NML with real data typically having heavy tails. 
When the population is not elliptically distributed, T rml or 
does not follow a central or noncentral chi-square dis¬ 
tribution in general. However, the values of the power of 
Trmi and are still inversely affected by the heaviness of 
the tails in the population distribution. Considering that T nn i 
and are obtained by rescaling T m i and T^'J , respectively, 
we can generalize the effect sizes in Equations 8 and 9 to 

E S rml = (F mla //3) 1/2 and ES^ ; = , (10) 

where /? is the average of the /? s in Equation 3 and /?,. is the 
counterpart of corresponding to T ml . Note that the power 

of T rm [ or T^j might only be approximately determined by 
the effect size in Equation 10 and sample size when the two 
statistics do not follow chi-square distributions. However, 
we expect that ]i r corresponding to T^ tl to be much smaller 
than /? corresponding to T rm i under NML, and thus T y ml to 
be a more powerful test statistic in general when the under¬ 
lying population distribution of x is of heavy tails. We use 
MC simulation to study the power properties of T nn i and 
Tmi i n next section. 

Incomplete Data 

We have discussed the issue of statistical power in SEM 

(V) 

with complete data via the test statistics T m i, T n „i, T ml , 
and T^j. Each of the statistics also has a corresponding 
version when a sample contains missing values. In parti¬ 
cular, estimates of L can be obtained using the normal- 
distribution-based expectation-maximization (EM) algo¬ 
rithm (Dempster, Laird & Rubin, 1977; Enders & Peugh, 
2004) or an expectation-robust (ER) algorithm (Little & 
Smith, 1987; Yuan, Chan, & Tian, 2016). The incomplete- 


data version of test statistics T m \ and T^j are obtained 
when the S in Equation 1 is replaced by the corresponding 
NML and robust estimates Y. respectively. The advantage 
of such a two-stage approach is that rescaled versions of 

Tmi and T^j have been developed for data with missing 
values (Yuan & Bentler, 2000; Yuan & Zhang, 2012), and 
they perform well in practice (Tong, Zhang & Yuan, 
2014). We expect that the power of each of the test 
statistics will be inversely affected by heavy-tailed distri¬ 
bution, and will use MC simulation to study them in the 
next section. 

An alternative test statistic is the likelihood ratio sta¬ 
tistic assuming the observed marginal variables of each 
incomplete case follow a normal distribution correspond¬ 
ing to a subset of the p complete variables, called direct 
ML or full information maximum likelihood (F1ML) in 
the SEM literature (Savalei & Bentler, 2009; Savalei & 
Falk, 2014). However, such an obtained statistic cannot 
be expressed in the form of the discrepancy function in 
Equation 1. Consequently, it is not clear how to define 
effect size for such a statistic or its rescaled version. 
Also, with normally distributed data as rare as unicorns 
in practice (Micceri, 1989), direct ML might not have 
any advantage over the two-stage approach. In addition, 
the two-stage approach facilitates the inclusion of aux¬ 
iliary variables, whereas it is not clear how to include 
auxiliary variables in direct ML without affecting the 
evaluation of the model for the substantive variables 
(Savalei & Bentler, 2009; Yuan & Lu, 2008). Thus, 
although our proposed method also applies to statistics 
evaluated following direct ML estimation, we do not 
study the direct likelihood ratio statistic with incomplete 
data in this article. 


The Monte Carlo Method for Estimating Power and 
Sample Size 

Let T be a statistic defined under NML, robust M-estimation, 
or any other estimation method. For a sample of size N drawn 
from a target population under Hq with complete or incom¬ 
plete data, a value of the statistic T is obtained at the end of the 
parameter estimation. We next obtain N r independent replica¬ 
tions of the process of computing T , and then order the 
obtained values of T from small to large, and denote them as 

T (i) < t ( 2) < ■ ■ ■ < T(N r ) ■ 

For a given level a, let the integer part of N r ( 1 — a) be 
denoted as N r ( i_ a ). For example, with a =.05 and 
N r = 1,000, N r (\- a ) = 950. The estimated critical value 
for T under the condition (e.g., sample size, population 
distribution, missing data scheme, etc.) is 
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We then draw an independent sample of the same size 
from the target population under H a , compute the value of 
T under the same estimation method by fitting an incor¬ 
rectly specified model, and compare the obtained value of T 
against ci_ a . The statistic is counted as non-significant if 
T <c\- a and significant otherwise. Repeating the process 
N a times, the percentage of significant T will be the esti¬ 
mated value of power under the MC method. In particular, 
the same process will generate an estimate of Type I error if 
we draw independent samples from the same population as 
that under Hq. 

The value of power by the MC method, denoted as p a , 
might be smaller or greater than the desired value p a ■ If 
Pa<Pa , then increase the sample size N and repeat the 
processes of computing ci_ a and p a . Otherwise, repeat the 
process by decreasing the sample size N. The estimated 
sample size N is obtained when p a is close enough to p a , 
say \p a —p a |<-01. In the implementation of the MC 
method, the sample size might be increased or decreased 
by a step size of 5 initially, and then by a step size of 1 
when p a becomes closer to p a . The iterative procedure of 
estimating N can be automatic and researchers just need to 
provide the desired value of p a , the estimation method, and 
features of the population distribution (e.g., marginal skew¬ 
nesses and kurtoses, or the multivariate kurtosis) in addition 
to the structural model. 


SIMULATION STUDY 

In this section, we use MC simulation to examine the MC 

(f) 

method and the properties of test statistics T m i, T rm i, T m [, and 
T rnd . Our main interests are (a) how reliable the MC method 
is in controlling Type I errors when compared to the con¬ 
ventional methods, and (b) which statistic is most powerful 
with typically non-normally distributed data in practice. 


Design 

The population is generated according to a confirmatory 
factor model with nine variables and three factors. The 
path diagram of the model is given in Figure 1, where 
factor loadings X 2 , k 3 , k 5 , Xq, kg, and kg, the nine error 
variances, and the three factor variances are all set at 1 . 0 ; 
and the three factor covariances are c/> n = -5, ^ 13 = .3, and 
^23 = -4. The particular values of the parameters in the 
population are not material because, for a given A, the 
power of a test statistic is mainly determined by the size of 
misspecification and the population distribution as mea¬ 
sured by the effect sizes in Equation 7 to 10. The three 
loadings represented by the dashed lines in Figure 1 are all 
equal (a = b = c), and are set as 0 for the condition of Hq, 
and are set as .2, .4, and . 6 , respectively for three condi¬ 
tions of H a . Thus, there are four different population 
covariance matrices £ in the study. 



FIGURE 1 A path diagram for the model that generated the population. 
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Two population distributions are included in the study. 
The first is multivariate normal, and the second is generated 
according to 

x = ft + /u, 

where E 1 / 2 is a symmetric matrix such that E 1 / 2 ^ 1 / 2 = £; 

z = (Z1,Z2,Z3,Z4,Z5,Z6,Z7,Z S ,Z9)' with Zi, z 2 , z 4 , z 5 , z 7 , z 8 

following the standard normal distribution N(0, 1); Z3 fol¬ 
lowing standardized x 2 , z 6 following standardized x 2 , andz 9 
following standardized x 2 ; VxfT^; and -1 to z 9 and u are 
all independent. Because E( 3/x 2 ) = 1 and 

Cov(x) = E(l/u 2 )~L = £, the normal and non-normal 
population distributions have the same covariance matrix. 
With additional statistical and algebraic computation fol¬ 
lowing Equation 2, it can be shown that the relative kurtosis 
for the non-normal population is ft ss 3.558. Thus, we 
would expect T m / to have a higher Type I error rate than 
the nominal level when compared against x 2 / 7 for inference. 
We would also expect the power values of T m i and T rm i to 
be smaller than those of T <2 'j and 7^ z . 

At each of the population conditions (four conditions of 
2 and two conditions of population distribution), we have 
eight conditions on sample size N = 100, 150, 200, 300, 
500, 1,000, 2,000, 3,000, which can be regarded as from 
small to large. 

A condition of incomplete data is also included. For 
each sample, variables X \, x 2 , x 4 , xs, x 7 , and x 8 are always 
observed. Variables X3, xe, and xg contain missing values 
that are generated according to: X3 is missing if 
(xi + x 2 )<ci, X 6 is missing if (x 4 + xs)<c 2 , and x 9 is miss¬ 
ing if (x 7 +x 8 )<C3, where the values of c\, c 2 , and C3 are 
controlled so that X3, X6, and xg each is missing 10%. Thus, 
there are eight observed patterns, and all the non-observed 
values are missing at random (MAR; Rubin, 1976). 

The same nine-variable, three-factor model, as implied by 
the path diagram in Figure 1, is used to fit all the samples in 
our study, where the three dashed lines are not included in 
the model. Thus, the nominal degrees of freedom for the four 
statistics are df = 24. Both NML and robust methods are 
used to estimate the model. For each complete sample, NML 
is carried out via the Fisher-scoring (FS) algorithm in esti¬ 
mating the parameters 6 in the structural model E( 0), and 
statistics T m \ and T rm i are subsequently obtained. The robust 
method using Fluber-type weights in Equation 6 with r 2 
being set at the 95th quantile of x 9 is used to obtain S, the 
FS algorithm is then used to obtain the estimate of the 

parameters in the structural model, and statistics T^j and 

(V) 

Kmi are subsequently computed at the robust 0. For each 
incomplete sample, NML is carried out first by the EM 
algorithm to obtain JL and then by the FS algorithm to obtain 
0, and subsequently T m i and T rm f, robust estimation is 


obtained similarly to that for complete data by the Huber- 
type weights, where the tuning parameter r 1 in (6) is set at 
the 95th quantile of x 2 , with p t being the number of observed 
variables in the fth case (Yuan et al., 2016). 

With 1,000 replications 3 of a statistic T ( = T m i, T rm i, 
t)'J or T^j) under Hg, our estimated critical value is 

c. 95 = 7)950), 

aiming to control Type I error at 5%. For each condition of 
//(i or II a , we independently draw another 1,000 replica¬ 
tions of a sample at the same size and consequently 1,000 
replications of the statistic T. Each replicated value is 
compared against the estimated critical value £ 95 . For the 
statistic T replicated under Hg, the proportion that is greater 
than c. 95 is our estimated Type I error. For the T replicated 
under an H a , the proportion that is greater than c. 95 is our 
estimated value of power. 

Our results are arranged in eight tables: two for 
complete and normally distributed data, two for complete 
and non-normally distributed data, two for incomplete and 
normally distributed data, and two for incomplete and non- 
normally distributed data. The first table in each set of two 
contains the estimated critical values and Type I errors, and 
the second table contains the estimated values of power 
corresponding to the three alternative covariance matrices 
with a = b = c = .20, .40, and .60, respectively. For the 
estimated power in Tables 3, 5, 7, and 9, we did not include 
the conditions when estimated values of power for all the 
statistics are 1 . 0 . 


Results 

The upper panel of Table 2 contains the estimated critical 
values of the statistics T„i, Trmi, T$, and T^, with com¬ 
plete and normally distributed data. The values of c .95 tend 
to be greater at smaller N and become close to the value 
c .95 = 36.415 corresponding to the nominal chi-square dis¬ 
tribution x 24 - The averages of the estimated critical values 
across the eight conditions of sample sizes are also reported 
in Table 2, and all are slightly greater than 36.415. 

The lower panel of Table 2 contains the estimated Type I 
errors when compared each statistic against c 95 (left panel) 
and against the estimated critical values c .95 (fight panel). 
Type I errors under c .95 in the left panel tend to be greater 
than those under c 95 in the right panel, especially for the 
statistic Tm,] at smaller N. In contrast, there is little varia¬ 
tion among the Type 1 errors under c. 95 , although Type I 
errors at N = 3,000 tend to be smaller for all the statistics. 


3 One could choose a larger number of replications if the range of the 
involved statistic or its df is large. 
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TABLE 2 

Estimated Critical Values c .95 (Upper Panel) and Type I Errors 
(Lower Panel), Complete Data and Normally Distributed Population 

(df = 24, c ,95 = 36.415) 


N 


T„,i 


Trmi 


r f( r ) 

1 ml 


r r( r ) 

1 rml 

too 


38.242 


39.651 


39.020 


37.693 

150 


37.501 


37.915 


37.538 


36.546 

200 


38.089 


38.378 


37.944 


37.460 

300 


37.011 


37.488 


37.596 


37.148 

500 


35.890 


36.046 


36.364 


36.056 

1,000 


36.632 


37.033 


37.164 


36.704 

2,000 


37.112 


36.956 


37.402 


37.164 

3,000 


37.066 


36.992 


37.212 


36.924 

Average 


37.193 


37.557 


37.530 


36.962 


Compared Against 

C.95 

Compared Against 

c.95 

N 

T m i 

Trmi 

ji r ) 

1 ml 

r r ( r ) 

1 rml 

Tmi 

Trmi 

ji r ) 

1 ml 

r ri r ) 

1 rml 

too 

0.077 

0.094 

0.079 

0.069 

0.053 

0.050 

0.048 

0.054 

150 

0.076 

0.092 

0.079 

0.069 

0.062 

0.072 

0.060 

0.068 

200 

0.072 

0.081 

0.071 

0.060 

0.046 

0.050 

0.053 

0.045 

300 

0.062 

0.070 

0.066 

0.060 

0.054 

0.056 

0.049 

0.051 

500 

0.050 

0.051 

0.051 

0.048 

0.056 

0.059 

0.052 

0.050 

1,000 

0.045 

0.044 

0.050 

0.040 

0.042 

0.038 

0.035 

0.035 

2,000 

0.052 

0.050 

0.054 

0.052 

0.043 

0.047 

0.047 

0.047 

3,000 

0.034 

0.036 

0.043 

0.041 

0.030 

0.032 

0.034 

0.033 

Average 

0.059 

0.065 

0.062 

0.055 

0.048 

0.051 

0.047 

0.048 


All the Type I errors in Table 2 are reasonably close to .05, 
due to a normally distributed population. 

With complete and normally distributed data. Table 3 
contains the estimated values of power when each statistic 


is compared against the critical values c .95 and c .95 in 
Table 2, respectively. The values of power under the clas¬ 
sical method determined by the separation between %] 4 and 
xi 4 (ri) are also included under y} in Table 3. Except at the 
smallest N, the power values by all the methods are rather 
close. When N is small, the values of power under y} tend 
to be greater than those under c .95 but smaller than those 
under c. 95 , especially with the statistic T rm t. Such a phe¬ 
nomenon is closely related to Type I errors in Table 2. It 
follows from the averaged values of power in the last row 
of Table 3 that T^J and Tj^ l are slightly less powerful than 
T m 1 and T rm / due to a normally distributed population. 

Table 4 contains the estimated critical values and Type I 
errors for complete data but with a non-normally distribu¬ 
ted population. The estimated critical values of T m i are 
about three times that of c .95 = 36.415, and the average 
Type I error of T m j under c .95 = 36.415 is .787! The esti¬ 
mated critical value for T rm j at N = 100 is also way above 
c .95 = 36.415, reflecting its greater variation at small N. 
Regardless of the size of the estimated critical values, 
estimated Type I errors of all the statistics under c .95 are 
rather close to .05, although not exactly. 

The results on power with complete data and the non- 
normally distributed population are given in Table 5. 
Because we cannot properly control Type I errors by com¬ 
paring a statistic against c. 95 , as reflected in Table 4, the 
estimated values of power in the left panel of Table 5 are 
not trustworthy. We thus only discuss the results in the right 
panel of Table 5, where substantial differences exist 
between NML and the robust method. Although Huber- 
type weights by setting the r 2 in Equation 6 at the 95th 


TABLE 3 

Estimated Values of Power by Classical (Left Panel) and Monte Carlo Methods (Right Panel), Complete Data and Normally Distributed 

Population 


Compared Against c .95 Compared Against c .95 


H a 

N 

x 2 

T m i 

Trmi 

T'( r ) 

1 ml 

T'( r ) 

1 rml 

Tmi 

Trmi 

r r( r ) 

1 ml 

ji r ) 

rml 

a = .2 

100 

0.141 

0.161 

0.188 

0.166 

0.146 

0.118 

0.123 

0.109 

0.124 


150 

0.204 

0.238 

0.259 

0.246 

0.210 

0.198 

0.207 

0.202 

0.208 


200 

0.275 

0.279 

0.286 

0.290 

0.254 

0.230 

0.235 

0.238 

0.223 


300 

0.432 

0.436 

0.447 

0.444 

0.421 

0.414 

0.410 

0.391 

0.398 


500 

0.715 

0.712 

0.716 

0.715 

0.702 

0.729 

0.725 

0.716 

0.709 


1,000 

0.980 

0.981 

0.981 

0.983 

0.979 

0.979 

0.978 

0.978 

0.976 

a = .4 

100 

0.514 

0.529 

0.561 

0.529 

0.478 

0.451 

0.433 

0.428 

0.417 


150 

0.749 

0.760 

0.785 

0.765 

0.728 

0.729 

0.730 

0.729 

0.722 


200 

0.892 

0.893 

0.890 

0.896 

0.869 

0.857 

0.847 

0.863 

0.834 


300 

0.987 

0.985 

0.985 

0.982 

0.979 

0.982 

0.981 

0.979 

0.978 

a = .6 

100 

0.884 

0.873 

0.882 

0.870 

0.829 

0.831 

0.814 

0.811 

0.796 


150 

0.985 

0.987 

0.987 

0.987 

0.981 

0.987 

0.984 

0.985 

0.981 


200 

0.999 

0.998 

0.998 

0.998 

0.997 

0.996 

0.996 

0.996 

0.996 

Average 


0.674 

0.679 

0.690 

0.682 

0.659 

0.654 

0.651 

0.648 

0.643 

Note. The values of RMSEA = (F m 

1 ia/df) 1 / 2 corresponding to a = 

.2, .4, and .6 are 

.040, .076, and 

.106, respectively. 
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TABLE 4 

Estimated Critical Values c 95 (Upper Panel) and Type I Errors 
(Lower Panel), Complete Data and Non-normally Distributed 
Population ( df = 24, c.95 = 36.415) 


N 


Tml 


Trmi 


r ri r ) 

1 ml 


r r ( r ) 
rml 

100 


123.962 


52.982 


46.861 


32.472 

150 


77.429 


39.269 


45.057 


33.574 

200 


86.916 


39.528 


45.206 


34.759 

300 


96.830 


36.948 


43.022 


33.971 

500 


103.913 


36.945 


45.076 


36.386 

1,000 


108.315 


36.926 


45.452 


37.522 

2,000 


105.350 


35.683 


44.058 


36.755 

3,000 


107.399 


36.410 


44.211 


37.052 

Average 


101.264 


39.336 


44.868 


35.311 


Compared Against 

c.95 

Compared Against 

c.95 

N 

T m i 

Trmi 

T (r) 

1 ml 

T (r) 

1 rml 

T m i 

Trmi 

rAr) 

1 ml 

rAr) 

1 rml 

100 

0.667 

0.120 

0.205 

0.012 

0.035 

0.033 

0.036 

0.034 

150 

0.706 

0.097 

0.198 

0.018 

0.079 

0.061 

0.049 

0.051 

200 

0.741 

0.068 

0.191 

0.024 

0.047 

0.037 

0.038 

0.043 

300 

0.762 

0.066 

0.209 

0.030 

0.050 

0.057 

0.066 

0.064 

500 

0.795 

0.054 

0.179 

0.039 

0.037 

0.045 

0.039 

0.039 

1,000 

0.869 

0.064 

0.179 

0.047 

0.034 

0.058 

0.033 

0.033 

2,000 

0.874 

0.049 

0.179 

0.051 

0.053 

0.055 

0.046 

0.048 

3,000 

0.881 

0.054 

0.187 

0.056 

0.068 

0.054 

0.047 

0.049 

Average 

0.787 

0.072 

0.191 

0.035 

0.050 

0.050 

0.044 

0.045 


quantile of xl might be far from optimal, the estimated 
values of power for statistics CJ and are way above 
those of T m j and T rm i for many conditions, including a = .2 
and N = 200, 300, and 500; a = .4 and N = 100, 150; and 
a = .6 and N = 100. There is little difference between 
NML and the robust method in statistical power when the 
values of power for TJ and are close to Type I errors 
or when the values of power for T mi and T rm i are close to 
.90. This is expected because no method is able to tell the 
difference between two distributions with a tiny separation 
due to a small sample size or small effect size, and all 
methods would have enough power if the separation 
between two distributions is large enough. 

Table 6 contains the estimated critical values and Type I 
errors with incomplete data from the normally distributed 
population, where all the critical values are above 
c 95 = 36.415, especially those corresponding to T m] and 

T$. Except for 7^ ; , Type I errors for the other three 
statistics under c .95 at N= 100 and 150 are double the 
nominal level of .05. In contrast, the estimated Type I errors 
under c.95 are much closer to .05, although the smallest one 
is .028. 

Table 7 contains the estimated values of power with 
incomplete data and the normally distributed population. 
The values of power among the four statistics under c.95 are 


TABLE 5 

Estimated Values of Power by Classical (Left Panel) and Monte Carlo Methods (Right Panel), Complete Data and Non-Normally Distributed 

Population 


Compared Against c . 9 5 Compared Against c 9 5 


H a 

N 

l 2 

Tml 

Trmi 

rr(r) 

1 ml 

ji r ) 

1 rml 

Tml 

Trmi 

rpir) 

1 ml 

rp{r) 

1 rml 

a = .2 

100 

0.141 

0.761 

0.201 

0.378 

0.045 

0.049 

0.049 

0.093 

0.094 


150 

0.204 

0.831 

0.195 

0.459 

0.098 

0.118 

0.132 

0.168 

0.162 


200 

0.275 

0.861 

0.196 

0.533 

0.150 

0.082 

0.120 

0.215 

0.205 


300 

0.432 

0.924 

0.251 

0.670 

0.336 

0.091 

0.233 

0.438 

0.440 


500 

0.715 

0.965 

0.369 

0.862 

0.620 

0.097 

0.343 

0.627 

0.620 


1,000 

0.980 

1.000 

0.690 

0.993 

0.971 

0.287 

0.666 

0.962 

0.963 


2,000 

1.000 

1.000 

0.942 

1.000 

1.000 

0.796 

0.951 

1.000 

1.000 


3,000 

1.000 

1.000 

0.982 

1.000 

1.000 

0.973 

0.982 

1.000 

1.000 

a = A 

100 

0.514 

0.901 

0.472 

0.716 

0.202 

0.107 

0.132 

0.382 

0.346 


150 

0.749 

0.962 

0.518 

0.857 

0.491 

0.293 

0.422 

0.625 

0.606 


200 

0.892 

0.981 

0.622 

0.939 

0.724 

0.266 

0.501 

0.794 

0.792 


300 

0.987 

0.997 

0.800 

0.993 

0.966 

0.385 

0.787 

0.981 

0.979 


500 

1.000 

1.000 

0.943 

1.000 

1.000 

0.679 

0.938 

1.000 

1.000 


1,000 

1.000 

1.000 

0.997 

1.000 

1.000 

0.994 

0.996 

1.000 

1.000 


2,000 

1.000 

1.000 

0.999 

1.000 

1.000 

1.000 

0.999 

1.000 

1.000 


3,000 

1.000 

1.000 

0.999 

1.000 

1.000 

1.000 

0.999 

1.000 

1.000 

a = .6 

100 

0.884 

0.978 

0.780 

0.944 

0.559 

0.189 

0.320 

0.748 

0.727 


150 

0.985 

0.999 

0.858 

0.987 

0.888 

0.629 

0.790 

0.944 

0.938 


200 

0.999 

1.000 

0.923 

0.998 

0.974 

0.687 

0.873 

0.978 

0.980 


300 

1.000 

1.000 

0.986 

1.000 

0.998 

0.872 

0.985 

0.999 

0.999 


500 

1.000 

1.000 

0.995 

1.000 

1.000 

0.992 

0.995 

1.000 

1.000 

Average 


0.798 

0.960 

0.701 

0.873 

0.715 

0.504 

0.629 

0.760 

0.755 


Note. The values of RMSEA = (F m i„/df)^~ corresponding to a = .2, .4, and .6 are .040, .076, and .106, respectively. 
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TABLE 6 

Estimated Critical Values c 95 (Upper Panel) and Type I Errors 
(Lower Panel), Incomplete Data and Normally Distributed 
Population ( df = 24, c 95 = 36.415) 


N 


T m l 


Trml 


r r( r ) 

1 ml 


r r( r ) 

1 rml 

too 


42.587 


40.698 


42.983 


38.057 

150 


40.394 


37.728 


41.001 


36.985 

200 


41.443 


38.291 


41.345 


37.479 

300 


40.505 


37.798 


40.779 


37.257 

500 


39.573 


37.013 


39.947 


36.497 

1,000 


40.539 


37.564 


40.881 


37.346 

2,000 


40.026 


36.945 


40.560 


37.174 

3,000 


40.074 


36.730 


40.328 


36.947 

Average 


40.643 


37.846 


40.978 


37.218 


Compared Against 

c.95 

Compared Against 

c.95 

N 

T m l 

T rm l 

ji r ) 

1 ml 

r r ( r ) 

1 rml 

T m l 

Trml 

ji r ) 

1 ml 

r ri r ) 

1 rml 

too 

0.139 

0.108 

0.145 

0.078 

0.050 

0.049 

0.050 

0.052 

150 

0.135 

0.101 

0.143 

0.078 

0.074 

0.074 

0.063 

0.069 

200 

0.119 

0.080 

0.127 

0.065 

0.049 

0.055 

0.053 

0.054 

300 

0.109 

0.069 

0.102 

0.057 

0.053 

0.051 

0.053 

0.050 

500 

0.086 

0.050 

0.084 

0.049 

0.050 

0.047 

0.047 

0.048 

1,000 

0.091 

0.045 

0.094 

0.040 

0.032 

0.030 

0.028 

0.027 

2,000 

0.097 

0.045 

0.102 

0.047 

0.039 

0.040 

0.039 

0.041 

3,000 

0.092 

0.047 

0.089 

0.045 

0.036 

0.041 

0.036 

0.036 

Average 

0.109 

0.068 

0.111 

0.057 

0.048 

0.048 

0.046 

0.047 


comparable, with T ml and T rml being slightly more powerful 

than Tl'J and on average. 

Table 8 contains the estimated critical values and Type 1 
errors with incomplete data from the non-normally distrib¬ 
uted population, where the estimated Type I errors for all 


the statistics under c .95 are affected by the sample size, and 
are not trustworthy when N is small. In contrast, Type I 
errors under c .95 are rather close to the nominal level, 
regardless of the values of c .95 in the upper panel. 

Table 9 contains the estimated values of power when the 
four statistics are compared against c .95 and c. 95 , respec¬ 
tively. Again, we cannot trust the estimated values of power 
in the left panel of Table 9 because they are confounded 
with Type I errors. Similar to the results in Table 5, statis- 
(r) (r) 

tics T ml and are much more powerful than T m i and T rm i 
for many conditions, including a = .2 and N = 200, 300, 
500, 1,000; a = .4 and N = 100, 150, 200, 300; and a = .6 
and N = 100, 150, 200. Again, there is little difference 
between the robust method and NML when T m j and T nn i 
have enough power or when T ml and have little power. 

In summary, the MC method developed in this article 
allows us to control Type I errors very well for all the test 
statistics, regardless of whether the population distribution 
is normally distributed or non-normally distributed or 
whether the data are complete or incomplete. With com¬ 
plete or incomplete data from the normal distribution, the 
robust method performs essentially the same as NML. In 
contrast, statistics under the robust method are much more 
powerful than those under NML for the non-normally dis¬ 
tributed population, regardless of whether data are com¬ 
plete or incomplete. Compared across the tables, T^j is 
slightly more powerful than across all the conditions 
on average. In contrast, T m j is slightly more powerful than 
T n „i for the normal population and somewhat less powerful 
than T rm i for the non-normally distributed popidation. 

Note that evaluating power by comparing each statistic 
against c .95 is parallel to the bootstrap and MC methods 


TABLE 7 

Estimated Values of Power by Classical (Left Panel) and Monte Carlo Methods (Right Panel), Incomplete Data and Normally Distributed 

Population 


Compared Against c.95 


Compared Against c.95 


H a 

N 

x 2 

Tml 

Trml 

r r ( r ) 

1 ml 

r r( r ) 

1 rml 

Tml 

Trml 

r ri r ) 

1 ml 

r r( r ) 

rml 

a = .2 

100 

0.141 

0.256 

0.193 

0.257 

0.149 

0.107 

0.105 

0.102 

0.105 


150 

0.204 

0.337 

0.252 

0.348 

0.220 

0.214 

0.218 

0.198 

0.203 


200 

0.275 

0.378 

0.286 

0.385 

0.251 

0.217 

0.230 

0.225 

0.226 


300 

0.432 

0.535 

0.431 

0.543 

0.406 

0.372 

0.374 

0.370 

0.365 


500 

0.715 

0.768 

0.679 

0.764 

0.657 

0.663 

0.656 

0.655 

0.656 


1000 

0.980 

0.984 

0.967 

0.986 

0.966 

0.956 

0.959 

0.955 

0.953 

a = .4 

100 

0.514 

0.602 

0.522 

0.605 

0.436 

0.388 

0.367 

0.373 

0.382 


150 

0.749 

0.790 

0.730 

0.791 

0.682 

0.690 

0.693 

0.675 

0.665 


200 

0.892 

0.922 

0.865 

0.918 

0.835 

0.815 

0.815 

0.822 

0.814 


300 

0.987 

0.987 

0.975 

0.988 

0.971 

0.967 

0.967 

0.964 

0.967 

a = .6 

100 

0.884 

0.896 

0.860 

0.900 

0.799 

0.780 

0.760 

0.763 

0.752 


150 

0.985 

0.989 

0.984 

0.989 

0.975 

0.975 

0.979 

0.969 

0.972 


200 

0.999 

1.000 

0.997 

0.999 

0.996 

0.994 

0.993 

0.994 

0.992 

Average 


0.674 

0.726 

0.672 

0.729 

0.642 

0.626 

0.624 

0.620 

0.619 


Note. The values of RMSEA = (F m i a /df) 1 ^ 2 corresponding to a — .2 .4 and .6 are .040, .076, and .106, respectively. 
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TABLE 8 

Estimated Critical Values c 95 (Upper Panel) and Type I Errors 
(Lower Panel), Incomplete Data and Non-Normally Distributed 
Population ( df = 24, c 95 = 36.415) 


N 


T m i 


Trml 


ji r ) 

1 ml 


r r( r ) 

1 rml 

100 


120.963 


49.944 


50.800 


32.333 

150 


77.787 


38.532 


48.456 


33.768 

200 


76.664 


36.440 


47.809 


34.273 

300 


72.499 


35.127 


46.350 


33.629 

500 


72.899 


35.271 


47.874 


35.771 

1,000 


72.633 


35.540 


48.730 


37.771 

2,000 


70.664 


36.238 


46.818 


36.463 

3,000 


70.972 


35.800 


46.754 


36.184 

Average 


79.385 


37.862 


47.949 


35.024 


Compared Against 

c .95 

Compared Against 

c. 95 

N 

T m i 

Trml 

ji r ) 

1 ml 

r r( r ) 

1 rml 

T m i 

Trml 

ji r ) 

1 ml 

r ri r ) 

1 rml 

too 

0.756 

0.141 

0.320 

0.015 

0.038 

0.043 

0.050 

0.050 

150 

0.790 

0.075 

0.293 

0.018 

0.047 

0.049 

0.058 

0.045 

200 

0.781 

0.046 

0.278 

0.030 

0.040 

0.046 

0.048 

0.052 

300 

0.763 

0.033 

0.289 

0.032 

0.054 

0.044 

0.066 

0.072 

500 

0.753 

0.032 

0.234 

0.035 

0.046 

0.040 

0.042 

0.044 

1,000 

0.789 

0.059 

0.257 

0.051 

0.047 

0.063 

0.037 

0.033 

2,000 

0.764 

0.048 

0.237 

0.052 

0.052 

0.051 

0.046 

0.050 

3,000 

0.784 

0.054 

0.263 

0.045 

0.058 

0.061 

0.043 

0.047 

Average 

0.773 

0.061 

0.271 

0.035 

0.048 

0.050 

0.049 

0.049 


proposed in Yung and Bentler (1996), Muthen and Muthen 
(2002), Schoemann et al. (2014), and Zhang (2014). 
According to Tables 2 and 3, the method controls Type I 
error reasonably well when the underlying population is 
multivariate normal and data are complete. However, the 
method does not work well when data are incomplete, 
especially when the underlying population is not multivari¬ 
ate normal. 


DISCUSSION AND RECOMMENDATION 

In this article we proposed using the MC method for power 
analysis in SEM, and using robust M-estimation to deal 
with non-normally distributed data in practice. The most 
notable feature of the MC method is that it allows us to 
reliably control Type I errors for any statistic for which the 
distribution might not be known. The advantage of a robust 
method is that it programmatically increases the effect size 
when the population distribution contains heavy tails. 
Compared to conventional methods in which power value 
is determined by the separation between and x3y( r l), the 
MC method needs a population from which we simulate 
our data. In practice, a researcher might not know what the 
true population distribution is, but can only specify mar¬ 
ginal skewnesses and kurtoses. Then one can simulate data 


TABLE 9 

Estimated Values of Power by Classical (Left Panel) and Monte Carlo Methods (Right Panel), Incomplete Data and Non-Normally Distributed 

Population 


Compared Against c 9 5 


Compared Against c 9 5 


H a 

N 

x 2 

T,„i 

Trml 

r r( r ) 

1 ml 

r r( r ) 

1 rml 

Tm! 

Trml 

r r( r ) 

1 ml 

r r( r ) 

1 rml 

a = .2 

too 

0.141 

0.814 

0.219 

0.482 

0.053 

0.055 

0.066 

0.106 

0.102 


150 

0.204 

0.856 

0.144 

0.527 

0.073 

0.095 

0.106 

0.157 

0.139 


200 

0.275 

0.877 

0.122 

0.599 

0.122 

0.109 

0.121 

0.207 

0.184 


300 

0.432 

0.921 

0.149 

0.701 

0.286 

0.198 

0.182 

0.376 

0.394 


500 

0.715 

0.956 

0.281 

0.860 

0.536 

0.286 

0.327 

0.546 

0.563 


1,000 

0.980 

0.999 

0.693 

0.993 

0.935 

0.665 

0.720 

0.924 

0.925 


2,000 

1.000 

1.000 

0.973 

1.000 

1.000 

0.977 

0.974 

1.000 

1.000 


3,000 

1.000 

1.000 

0.999 

1.000 

1.000 

0.999 

0.999 

1.000 

1.000 

a = .4 

too 

0.514 

0.923 

0.461 

0.756 

0.180 

0.115 

0.160 

0.330 

0.327 


150 

0.749 

0.968 

0.403 

0.873 

0.407 

0.292 

0.336 

0.569 

0.529 


200 

0.892 

0.981 

0.443 

0.939 

0.657 

0.405 

0.443 

0.746 

0.721 


300 

0.987 

0.996 

0.657 

0.993 

0.925 

0.727 

0.694 

0.962 

0.969 


500 

1.000 

0.999 

0.931 

1.000 

0.999 

0.936 

0.942 

0.999 

0.999 

a = .6 

100 

0.884 

0.979 

0.756 

0.949 

0.511 

0.218 

0.375 

0.679 

0.664 


150 

0.985 

0.997 

0.697 

0.989 

0.829 

0.643 

0.643 

0.902 

0.892 


200 

0.999 

1.000 

0.766 

0.994 

0.954 

0.819 

0.766 

0.972 

0.968 


300 

1.000 

1.000 

0.956 

1.000 

0.997 

0.982 

0.965 

0.997 

0.997 


500 

1.000 

1.000 

1.000 

1.000 

1.000 

0.998 

1.000 

1.000 

1.000 

Average 


0.764 

0.959 

0.592 

0.870 

0.637 

0.529 

0.546 

0.693 

0.687 


Note. The values of RMSEA = [C m i a /df y 1 corresponding to a = .2, .4, and .6 are .040, .076, and .106, respectively. 
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using the method as described in Vale and Maurelli (1983), 
Nagahara (2004), or Auerswald and Moshagen (2015). In 
the Appendix, we describe a method that allows the control 
of the relative multivariate kurtosis in addition to marginal 
skewnesses and kurtoses. 

When nonnormally distributed data are expected and a 
researcher can approximately specify the marginal skew¬ 
nesses and kurtoses or multivariate kurtosis, the MC 
method including the features of non-normality is recom¬ 
mended. In practice, information on skewness and kurtosis 
can be obtained from meta-analysis on distributions of real 
data (e.g., Blanca et ah, 2013; Cain et ah, in press) or by 
searching the Web or contacting authors for data that are 
collected using the same measurement scales or on the 
same research topic. When a sample with a moderate size 
has already been collected, then one could execute the MC 
method by drawing samples from the empirical distribution 
of the sample (nonparametric bootstrap). If a researcher has 
no idea of the population distribution, then use the chi- 
square-based method when p is relatively small, and the 
MC method with normally distributed population when p is 
relatively large. However, power and sample size deter¬ 
mined by such methods might not be reliable when the 
data to be collected turn out to be non-normally distributed 
with heavy tails. As for test statistics, we recommend using 
rj ( T m i following a robust method) with non-normally 
distributed data, and use T mi when one has confidence 
that the population is normally distributed. The MC meth¬ 
odology as developed in this article together with the robust 
methods will be available via an online statistical package, 
which allows researchers to conduct power analysis by 
specifying marginal skewness and kurtoses as well as the 
relative multivariate kurtosis. 

For the simulation study in the previous section, we used 
the robust method with Huber-type weights corresponding 
to down-weighting 5% of a normally distributed popula¬ 
tion. The robust method might be refined by adjusting the 
tuning parameter r in Equation 6. In particular, one can 
choose r corresponding to the empirically most efficient 
parameter estimates using MC or the bootstrap methodol¬ 
ogy (Yuan et ah, 2004). The most efficient parameter esti¬ 
mates correspond to the smallest possible /?,. in Equation 9 
and consequently most powerful test statistic. 

Because the values of effect sizes defined in Equations 7 
to 10 are simply measures of model misspecification, there 
might be a conflict of interest between researchers who like 
to have a test statistic with better power and those who are 
interested in a model that might not fit their data as well but 
allows them to elaborate on the substantive variables. A 
robust method might yield more efficient parameter esti¬ 
mates but at the same time the test statistics Tj or will 
make an already poor model even poorer, due to greater 
effect sizes. Although researchers might not choose a test 
statistic with a better power, we still recommend using a 


robust method in real data analysis. This is because a 
seemingly poor model might be due to outliers or a small 
proportion of observations that do not fit the model well, 
and robust methods will down-weight the influence of these 
observations and return the model with the credit it 
deserves. 
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APPENDIX 

Simulation Data with Desired Multivariate Kurtosis 

This appendix describes a method for simulating non- 
normally distributed data from a distribution with given 
marginal skewnesses and kurtoses as well as multivariate 
kurtosis. Let p and 2 be the target population mean vector 
and covariance matrix, respectively. There is a symmetric 
matrix A = (ay) = 2 1 / 2 such that AA' = 2. Our model 
for simulating non-normally distributed data is 

x = fi + uAz , (A.l) 

where z = (z\,Z 2 , ■ ■ • ,z p )' with E(zj ) = 0, E(zj ) = 1.0, 
E(zj) = skewj, and E(z 4 ) = kurtj + 3, and u is a standardized 
random variable with E(ir) = 1, /?(w 3 ) = y 3 , E(u 4 ) = y 4 , 
and is independent with z. The way of simulating data accord¬ 
ing to Equation A.l is a special case of a method proposed in 
Yuan and Bentler (1997a). It can be shown that the marginal 
skewness and kurtosis of the x in Equation A. 1 are given by 

p n 

skew(jq) = y 3 ^ a\ skew //erf/ 2 , 

7=1 

1=1,2,...,/*; (A.2) 

P 

kurt(xi) = y 4 ^ a 4 kurty/a 2 . + 3(y 4 - 1), 

7=1 

i = l,2,...,/*; (A.3) 

and the relative multivariate kurtosis is given by 


^ = Y 4 |l + ^kurty/>(p + 2)] j. (A.4) 

Note that there are 2p + 1 Equations in A.2, A.3 and A.4. 
The coefficients ay are determined via the covariance 
matrix 2. The left sides of Equation A.2, A.3, and A.4 
are the marginal skewnesses, kurtoses, and relative multi¬ 
variate kurtosis to be specified, and the right sides of A.2 to 
A.4 contain 2p + 2 unknown quantities: ( skewj,kurtj ), 
y=l, 2, ..., p, y 3 , and y 4 . Because the number of 
unknowns is more than the number of Equations, there 
can be more than one set of skewj and kurtj, j = 1 , 2 , 
...,/>; and y 3 and y 4 that satisfies A.2 to A.4, which can 
be solved numerically. 

Once skewj and kurtj, j = 1, 2, ...,/, are obtained, 
standardized Zj can be obtained by the power transforma¬ 
tion method of Fleishman (1978; see also Tadikamalla, 
1980) and so is the random variable u. 

Note that the values of skewness and kurtosis of x 
cannot be arbitrarily chosen because moments of random 
variables need to satisfy certain conditions (see, e.g.. 
Yuan & Bentler, 1997a). If Equations A.2, A.3, or A.4 
do not have a set of solutions for the required values of 
s kew(xi) and kurt(xi), then it is most likely that the 
specified values are inadmissible in the sense that no 
pupulation distribution satisfies such conditions. 
Similarly, even if skewj and kurtj are obtained, there is 
still a possibility that the method in Fleishman (1978) or 
Tadikamalla (1980) cannot generate random z; that 
satisfy such desired values. In such a case, there is a 
need to respecify these values. 


